Data-driven Natural Language Generation: Paving the Road to Success
Authors
Abstract
We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) The lack of reliable automatic evaluation metrics for NLG, and (b) The scarcity of high quality in-domain corpora. We address the first problem by thoroughly analysing current evaluation metrics and motivating the need for a new, more reliable metric. The second problem is addressed by presenting a novel framework for developing and evaluating a high quality corpus for NLG training.

1 Evaluation metrics for NLG

Up to 60% of NLG research published between 2012–2015 relies on automatic evaluation measures, such as BLEU (Gkatzia and Mahamood, 2015). The use of such metrics is, however, only sensible if they are known to be sufficiently correlated with human preferences, which is not the case, as we show in the most complete study to date, across metrics, systems, datasets and domains. We evaluate three end-to-end NLG systems: RNNLG (Wen et al., 2015), TGen (Dušek and Jurčíček, 2015) and LOLS (Lampouras and Vlachos, 2016), using a large number of 21 automated metrics. The metrics are divided into groups of word-based metrics (WBMs, such as TER (Snover et al., 2006), BLEU (Papineni et al., 2002), ROUGE (Lin, 2004), semantic similarity (Han et al., 2013) etc.) and grammar-based metrics (GBMs, such as readability, characters per utterance and per word, syllables per sentence and per word, number of misspellings etc.).

              Lexical richness    Syntactic complexity
Dataset       LS       MSTTR      Level 0-1    Level 6-7
our corpus    0.57     0.75       46%          16%
SFRest        0.43     0.62       47%          13%
SFHot         0.43     0.59       51%          15%
Bagel         0.42     0.41       50%          16%

Table 1: Lexical richness and syntactic variation for the collected corpus and other popular datasets. LS measures the proportion of less frequent words in the text, MSTTR measures the type-token ratio normalised by the size of the corpus. For D-level complexity, Level 0-1 include syntactically simple sentences, Level 6-7 include the most complicated sentences.

To assess the metrics’ reliability, we calculate the Spearman correlation between the metrics and human ratings for the same natural language (NL) utterances, the accuracy of relative rankings and conduct a detailed
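
The Spearman correlation mentioned above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names `average_ranks` and `spearman` and the sample scores are hypothetical, chosen only to show how rank agreement between an automatic metric and human ratings is computed. Ties receive the mean of the rank positions they occupy, and the coefficient is the Pearson correlation of the two rank vectors.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of items tied with the item at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical data: one automatic metric score and one human rating
# per generated utterance (illustrative values, not from the paper).
metric_scores = [0.31, 0.45, 0.12, 0.80, 0.56]
human_ratings = [3, 5, 2, 6, 4]

if __name__ == "__main__":
    print(round(spearman(metric_scores, human_ratings), 3))
```

A value near 1 would mean the metric orders utterances much as the human raters do; a value near 0 would mean the metric's ordering carries little information about human preference.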
Similar resources
Automatic Generation of a Multi Agent System for Crisis Management by a Model Driven Approach
Considering the increasing occurrences of unexpected events and the need for pre-crisis planning in order to reduce risks and losses, modeling instant response environments is needed more than ever. Modeling may lead to more careful planning for crisis-response operations, such as team formation, task assignment, and doing the task by teams. A common challenge in this way is that the model shou...
The Influence of Data-Driven Exercises Through Using a Computer Program on Vocabulary Improvement in an EFL Context
The present study was conducted to evaluate data driven learning (DDL) combined with Computer Assisted Language Learning (CALL) as an approach to improving vocabulary knowledge of Iranian postgraduates majoring in teaching English, English literature and translation. The purpose was to help language learners get familiar with DDL as a student-centered method taking advantage of a computer progr...
Context-Sensitive Natural Language Generation: From Knowledge-Driven to Data-Driven Techniques
Context-sensitive Natural Language Generation is concerned with the automatic generation of system output that is in several ways adaptive to its target audience or the situational circumstances of its production. In this article, I will provide an overview of the most popular methods that have been applied to context-sensitive generation. A particular focus will be on the shift from knowledge-...
INVESTIGATING L2 TEACHERS’ PEDAGOGICAL SUCCESS: THE ROLE OF SPIRITUAL INTELLIGENCE
Teachers can influence the complex process of learning in education, in general, and in second/foreign language (L2) learning in particular. In this light, understanding the factors influencing teachers’ pedagogical success can help L2 teachers achieve more effective teaching. This study then investigated the role of spiritual intelligence (SI) in L2 teachers’ pedagogical success. In so doing, ...
Journal title:
- CoRR
Volume abs/1706.09433
Publication date 2017